A statistical property of multiagent learning based on Markov decision process

Authors

  • Kazunori Iwata
  • Kazushi Ikeda
  • Hideaki Sakai
Abstract

We exhibit an important property, called the asymptotic equipartition property (AEP), of empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three different cases: blind, visible, and communicable. We also derive a bound on the speed at which the empirical sequence converges in probability to the best sequence, so that multiagent learning yields the best cooperative result.
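For context, the AEP invoked here is, in its classical form (the Shannon–McMillan–Breiman theorem for stationary ergodic sources), the statement that the per-symbol log-probability of an observed sequence concentrates around the entropy rate. A minimal sketch of that standard statement, written for a generic stationary ergodic process (X_1, X_2, ...) rather than the paper's exact multiagent-MDP formulation, is:

\[
  -\frac{1}{n}\,\log p(X_1, X_2, \dots, X_n) \;\longrightarrow\; H
  \qquad \text{in probability as } n \to \infty,
\]

where H denotes the entropy rate of the process. In the paper's setting the sequences in question are the empirical sequences generated by the agents in an ergodic multiagent MDP, so, roughly speaking, observed sequences are asymptotically confined to a typical set whose size is governed by the entropy rate; the paper develops the MDP-specific version of this statement.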


Similar Articles

Multiagent Reinforcement Learning in Stochastic Games

We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focuses on repeated games or extensive-form games. For stochastic games...


Planning, Learning and Coordination in Multiagent Decision Processes

There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interest...


Multiagent Planning with Bayesian Nonparametric Asymptotics

Autonomous multiagent systems are beginning to see use in complex, changing environments that cannot be completely specified a priori. In order to be adaptive to these environments and avoid the fragility associated with making too many a priori assumptions, autonomous systems must incorporate some form of learning. However, learning techniques themselves often require structural assumptions to...


Exploiting Domain Structure in Multiagent Decision-Theoretic Planning and Reasoning

Ph.D. thesis by Akshat Kumar (B.Tech., Indian Institute of Technology Guwahati; M.Sc., University of Massachusetts Amherst), University of Massachusetts Amherst, May 2013, directed by Professor Shlomo Zilberstein. This thesis focuses on decision-theoretic reasoning and planning problems that arise when a group of collaborative...


Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds

Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space to offer a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent se...



Journal title:
  • IEEE Transactions on Neural Networks

Volume 17, Issue 4

Pages: -

Publication date: 2006